FlipDial: A Generative Model for Two-Way Visual Dialogue

Authors

Daniela Massiceti

N. Siddharth

Puneet Kumar Dokania

Philip H. S. Torr

Abstract

We present FLIPDIAL, a generative model for Visual Dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FLIPDIAL learns both to answer questions and put forward questions, capable of generating entire sequences of dialogue (question-answer pairs) which are diverse and relevant to the image. To do this, FLIPDIAL relies on a simple but surprisingly powerful idea: it uses convolutional neural networks (CNNs) to encode entire dialogues directly, implicitly capturing dialogue context, and conditional VAEs to learn the generative model. FLIPDIAL outperforms the state-of-the-art model in the sequential answering task (1VD) on the VisDial dataset by 5 points in Mean Rank using the generated answers. We are the first to extend this paradigm to full two-way visual dialogue (2VD), where our model is capable of generating both questions and answers in sequence based on a visual input, for which we propose a set of novel evaluation measures and metrics.

Similar resources

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It represents an extension of the Visual Question Answering task in that the agent needs to answer a question about an image, but it needs to do so in light of the previous dialogue that has taken place. The key challenge in Visual Dialogue is thus maintaining a consistent, and natural dialogue w...

Comparison of Bayesian Discriminative and Generative Models for Dialogue State Tracking

In this paper, we describe two dialogue state tracking models competing in the 2012 Dialogue State Tracking Challenge (DSTC). First, we detail a novel discriminative dialogue state tracker which directly estimates slot-level beliefs using deterministic state transition probability distribution. Second, we present a generative model employing a simple dependency structure to achieve fast inferen...

Automatic Colorization of Grayscale Images Using Generative Adversarial Networks

Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...

Dialogue patterns - A visual language for dynamic dialogue

A dynamic dialogue is a conversation in which each participant alternately selects remarks based on a changing world state and in which each remark can change the world state. Dynamic dialogues happen frequently as conversations between a player character (PC) and a non-player character (NPC) in a computer game. When it is the PC’s turn to speak, the current game state is used to filter the sta...

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural ne...

Journal:

CoRR

Volume abs/1802.03803

Publication date 2018
